English-Chinese Cross-Language Information Retrieval using Lucene Toolkit
Abstract
In this paper, we present our English-Chinese Cross-Language Information Retrieval (CLIR) system. We focus on finding effective translation equivalents between English and Chinese and on improving the performance of Chinese IR. For English-Chinese CLIR, we adopt query translation as the dominant strategy and use an English-Chinese bilingual dictionary as a key knowledge resource for acquiring correct translations. For Chinese monolingual retrieval, we investigate the use of different entities as index units and implement our retrieval system on top of the Lucene toolkit. For system evaluation, we present an effective method for generating the sets of relevant documents for query topics.
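The two steps named in the abstract, dictionary-based query translation and the choice of index units for Chinese, can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use Lucene: the toy dictionary `TOY_DICT`, the function names, and the choice of overlapping character bigrams as the index unit are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy dictionary (an assumption for this sketch);
# a real system would load a full English-Chinese bilingual lexicon.
TOY_DICT: Dict[str, List[str]] = {
    "information": ["信息"],
    "retrieval": ["检索"],
    "computer": ["计算机", "电脑"],
}


def translate_query(query: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Replace each English term with all of its dictionary translations."""
    translations: List[str] = []
    for term in query.lower().split():
        # Terms missing from the dictionary are simply dropped in this sketch.
        translations.extend(dictionary.get(term, []))
    return translations


def char_bigrams(text: str) -> List[str]:
    """Split a Chinese string into overlapping character bigrams."""
    if len(text) < 2:
        # A single character is kept as its own unit; empty input yields nothing.
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def query_to_index_terms(query: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Translate an English query, then break each translation into bigrams."""
    terms: List[str] = []
    for translation in translate_query(query, dictionary):
        terms.extend(char_bigrams(translation))
    return terms
```

Calling `query_to_index_terms("information retrieval", TOY_DICT)` would yield the Chinese terms to match against a bigram index. Keeping every translation of an ambiguous word, as this sketch does, is the simplest option; selecting among candidate translations is where a real system would need additional logic.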
Similar resources
Research on Lucene-based English-Chinese Cross-Language Information Retrieval
In this paper, we present our English-Chinese Cross-Language Information Retrieval (CLIR) system. We focus on finding effective translation equivalents between English and Chinese and on improving the performance of Chinese IR. For English-Chinese CLIR, we adopt query translation as the dominant strategy and use an English-Chinese bilingual dictionary as a key knowledge res...
Using Wikipedia and Wiktionary in Domain-Specific Information Retrieval
The main objective of our experiments in the domain-specific track at CLEF 2008 is utilizing semantic knowledge from collaborative knowledge bases such as Wikipedia and Wiktionary to improve the effectiveness of information retrieval. While Wikipedia has already been used in IR, the application of Wiktionary in this task is new. We evaluate two retrieval models, i.e. SR-Text and SR-Word, based ...
Exploiting the LDC Chinese-English Bilingual Wordlist for Cross Language Information Retrieval
We investigated using the LDC English/Chinese bilingual wordlists for English-Chinese cross language retrieval. It is shown that the Chinese-to-English wordlist can be considered as both a phrase and word dictionary, and is preferable to the English-to-Chinese version in terms of phrase translation and word translation selection. Additional techniques such as frequency-based term selection, tra...
English-Chinese CLIR using a Simplified PIRCS System
A GUI is presented with our PIRCS retrieval system for supporting English-Chinese cross language information retrieval. The query translation approach is employed using the LDC bilingual wordlist. Given an English query, different translation methods and their retrieval results can be demonstrated.
Phrasal Translation for English-Chinese Cross Language Information Retrieval
This paper introduces a simple and effective nonoverlapping unigram and bigram segmentation method for both monolingual Chinese and English-Chinese cross language retrieval. It also describes English-Chinese cross language retrieval experiments involving 54 topics and some 164,000 documents. The translation of English queries to Chinese is done using a Chinese-English dictionary of about 120,00...